Add 24 compressor #167
base: add-targets-and-ignore-support
Conversation
Very clean. LGTM after tests!
Overall the code looks simple. I'd like to reformulate the scope, though. Specifically, I'm not following why we are restricting to just 2:4 right now when we could easily expand this to handle all sparsity cases: detect whether a weight is in 2:4 format or some type of structured pruning, and fall back to unstructured if neither applies. cc @dsikka
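To make the suggestion concrete, here is a minimal sketch of what pattern detection could look like. This is illustrative only, not code from the PR; the function name `classify_sparsity` and the simplification of checking only 2:4 vs. unstructured (structured-pruning detection omitted) are my assumptions.

```python
import numpy as np

def classify_sparsity(weight: np.ndarray) -> str:
    """Classify a weight matrix's sparsity pattern (hypothetical helper).

    Returns "2:4" when every contiguous group of 4 elements along the
    last axis has at most 2 non-zeros, otherwise "unstructured".
    Structured-pruning detection is omitted for brevity.
    """
    if weight.shape[-1] % 4 != 0:
        return "unstructured"
    groups = (weight != 0).reshape(-1, 4)   # one row per group of 4
    nnz_per_group = groups.sum(axis=1)      # non-zeros in each group
    return "2:4" if (nnz_per_group <= 2).all() else "unstructured"

# Each group of 4 keeps at most 2 values -> classified as 2:4
w24 = np.array([[1.0, 0.0, 2.0, 0.0, 0.0, 3.0, 0.0, 4.0]])
# One group has 3 non-zeros -> classified as unstructured
wuns = np.array([[1.0, 2.0, 3.0, 0.0, 0.0, 0.0, 0.0, 0.0]])

print(classify_sparsity(w24))   # -> 2:4
print(classify_sparsity(wuns))  # -> unstructured
```

A dispatcher built on such a check could pick the right compressor automatically instead of hard-coding 2:4.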
testing?
This PR introduces the `Sparse24Compressor`, designed for 2:4 sparse models. The implementation is based on #182 and corresponds to Part 3 of the [Design Document](https://www.notion.so/Design-Document-24-Compressor-25ac643aee604c298f2bb12a6c220861?pvs=4).

Key Changes
- Introduces the `Sparse24Compressor` for handling 2:4 sparsity in models.
- Adds support for the `torch.float8e4m3` dtype.

Class Hierarchy
The `Sparse24Compressor` follows the established compressor class hierarchy.

File Structure
The `Sparse24Compressor` and associated logic are placed within the `sparse_compressors` module.

Verification Methodology
The `Sparse24Compressor` was tested using a comprehensive script that validates its behavior through the following steps:

1. **Load Model**: An uncompressed model is loaded from the Hugging Face model hub or a local directory.
2. **Compression**: The model is compressed using `ModelCompressor`, and the compressed version is saved.
3. **Decompression**: A new base model is initialized, and the compressed weights are decompressed using `ModelCompressor.decompress`.
4. **Parameter Validation**: Parameters in the decompressed model are verified to match the original uncompressed model.
5. **Inference Check**: The decompressed model is used to generate text, ensuring correctness and functionality.
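For intuition about what a 2:4 compressor stores, here is a standalone round-trip sketch: pack each group of 4 elements into its (up to) 2 non-zero values plus their in-group positions, then scatter them back. This is a toy illustration of the format, not the PR's actual `Sparse24Compressor` implementation; the helper names are hypothetical.

```python
import numpy as np

def compress_24(dense: np.ndarray):
    """Pack a 2:4-sparse matrix into values + in-group column indices.

    Assumes the input already satisfies the 2:4 pattern (at most 2
    non-zeros in every contiguous group of 4). Hypothetical helper.
    """
    groups = dense.reshape(-1, 4)
    values = np.zeros((groups.shape[0], 2), dtype=dense.dtype)
    idx = np.zeros((groups.shape[0], 2), dtype=np.int8)
    for g, row in enumerate(groups):
        nz = np.flatnonzero(row)[:2]        # positions of the kept values
        values[g, : len(nz)] = row[nz]
        idx[g, : len(nz)] = nz
    return values, idx, dense.shape

def decompress_24(values, idx, shape):
    """Inverse of compress_24: scatter values back into a dense matrix."""
    groups = np.zeros((values.shape[0], 4), dtype=values.dtype)
    for g in range(values.shape[0]):
        groups[g, idx[g]] = values[g]
    return groups.reshape(shape)

# Round trip: compress then decompress recovers the original matrix
w = np.array([[1.0, 0.0, 2.0, 0.0, 0.0, 3.0, 0.0, 4.0]], dtype=np.float32)
vals, idx, shape = compress_24(w)
restored = decompress_24(vals, idx, shape)
print(np.array_equal(restored, w))  # -> True
```

This mirrors the verification flow above at matrix scale: the real script does the same compress/decompress/compare cycle on whole model checkpoints.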
Note: the fp8 test can only run on GPUs with CUDA compute capability of at least 9.0.
Proof that it passes on the right device: